beyond-pg-init: report PSI memory pressure to the host over vsock #7
Merged
The host memory controller (instd) right-sizes each VM's balloon/hotplug from a distress signal. Its sharpest signal is Linux PSI (/proc/pressure/memory), which measures memory-stall time directly and catches the buffered read() cache misses a database generates: exactly the thrash a Postgres VM hits when its page cache is squeezed. But this primitive never reported any guest resource stats, so the controller was flying blind on the workload that needs it most.

Add a periodic (30s) GuestResourceStats (0xA2) report carrying PSI some/full avg10, multiplexed onto the existing substrate connection alongside heartbeats and log relay. We send PSI only (disk_total omitted) so the host skips disk billing for this report.

The frame is byte-compatible with instd's vsock_protocol::GuestResourceStatsPayload decoder; resource_stats_frame_is_stable pins the wire format the same way ready_frame_is_stable does, so it can't drift.

Requires the guest kernel booted with psi=1 (instd sets this on the cmdline); if PSI is unavailable the reporter simply sends nothing and the controller falls back to its balloon-stat signals.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
The host memory controller (instd) right-sizes each VM's balloon/hotplug from a guest distress signal. Its sharpest signal is Linux PSI (`/proc/pressure/memory`), which measures memory-stall time directly, so it catches the buffered `read()` cache misses a database generates, which the previous `major_faults` signal misses. That's exactly the thrash a Postgres VM hits when its page cache is squeezed (a real incident: a Postgres VM pinned both vCPUs at 99% for 12h).

But this primitive never reported any guest resource stats over vsock, so the controller was flying blind on the workload that needs it most.
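For readers unfamiliar with PSI: on a kernel with PSI enabled, `/proc/pressure/memory` holds a `some` line and a `full` line, each with `avg10=`, `avg60=`, `avg300=` and `total=` fields. The following is a minimal sketch of extracting the two `avg10` values, not the code in this PR. The names `MemoryPressure`, `parse_psi_sketch` and `read_psi_sketch` are hypothetical stand-ins chosen for illustration; the PR's own `parse_memory_pressure()` and `read_memory_pressure()` may differ in types and error handling.

```rust
use std::fs;

/// avg10 values from /proc/pressure/memory: the percentage of the last
/// 10 seconds in which some (or all) non-idle tasks stalled on memory.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MemoryPressure {
    pub some_avg10: f32,
    pub full_avg10: f32,
}

/// Parse the text of /proc/pressure/memory.
/// Returns None if either the `some` or the `full` avg10 value is missing
/// or malformed.
pub fn parse_psi_sketch(text: &str) -> Option<MemoryPressure> {
    let mut some = None;
    let mut full = None;
    for line in text.lines() {
        let mut fields = line.split_whitespace();
        // The first token is the line kind: "some" or "full".
        let kind = fields.next();
        // Find the avg10=<value> token among the remaining fields.
        let avg10 = fields
            .find_map(|f| f.strip_prefix("avg10="))
            .and_then(|v| v.parse::<f32>().ok());
        match kind {
            Some("some") => some = avg10,
            Some("full") => full = avg10,
            _ => {}
        }
    }
    Some(MemoryPressure {
        some_avg10: some?,
        full_avg10: full?,
    })
}

/// Read and parse the PSI file. Returns None when the file cannot be read
/// (for example, when the kernel was not booted with PSI enabled).
pub fn read_psi_sketch() -> Option<MemoryPressure> {
    let text = fs::read_to_string("/proc/pressure/memory").ok()?;
    parse_psi_sketch(&text)
}
```

Returning `Option` rather than an error keeps the "PSI unavailable" case cheap for the caller: a reporter built on this sketch can skip the report whenever it gets `None`.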
What
- Adds a periodic (30s) `GuestResourceStats` (`0xA2`) report carrying PSI `some.avg10` / `full.avg10`, multiplexed onto the existing substrate connection alongside heartbeats and log relay.
- Sends PSI only (omits `disk_total_bytes`) so the host skips disk billing for this report; Postgres disk usage isn't tracked here.
- The frame is byte-compatible with instd's `vsock_protocol::GuestResourceStatsPayload` decoder. `resource_stats_frame_is_stable` pins the wire format the same way `ready_frame_is_stable` does, so it can't silently drift.
- `parse_memory_pressure()` comes with a unit test; `read_memory_pressure()` returns `None` when PSI is unavailable.

Companion change
The host side (wire field, shared collector for the in-repo primitives, instd ingestion, and the controller treating PSI as the primary distress signal) lands in the `beyond` repo. instd sets `psi=1` on the guest kernel cmdline (the kernel already ships `CONFIG_PSI=y`), so no kernel rebuild is needed. If PSI is unavailable the reporter sends nothing and the controller falls back to its balloon-stat signals.

Test
`cargo test -p beyond-pg-init substrate`: `ready_frame_is_stable`, `resource_stats_frame_is_stable`, and `parses_psi_memory` all pass.

🤖 Generated with Claude Code
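To illustrate what a pinned wire-format test such as `resource_stats_frame_is_stable` guards against, here is a sketch of the golden-bytes technique. The frame layout below (one type byte followed by two little-endian `f32` values) is invented for illustration and is not instd's actual `GuestResourceStatsPayload` format; only the `0xA2` type byte comes from this PR. The function name `encode_stats_frame_sketch` is likewise hypothetical.

```rust
/// Frame type byte for GuestResourceStats, as named in this PR.
const GUEST_RESOURCE_STATS: u8 = 0xA2;

/// Hypothetical encoder, for illustration only:
/// [type byte][some_avg10 as f32 LE][full_avg10 as f32 LE].
/// This is NOT the real instd wire layout.
pub fn encode_stats_frame_sketch(some_avg10: f32, full_avg10: f32) -> Vec<u8> {
    let mut frame = Vec::with_capacity(9);
    frame.push(GUEST_RESOURCE_STATS);
    frame.extend_from_slice(&some_avg10.to_le_bytes());
    frame.extend_from_slice(&full_avg10.to_le_bytes());
    frame
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn stats_frame_sketch_is_stable() {
        // Golden bytes: changing field order, width, or endianness in the
        // encoder makes this comparison fail, so the format cannot drift
        // without someone updating the test on purpose.
        let frame = encode_stats_frame_sketch(1.0, 0.5);
        assert_eq!(
            frame,
            [0xA2, 0x00, 0x00, 0x80, 0x3F, 0x00, 0x00, 0x00, 0x3F]
        );
    }
}
```

Comparing against literal bytes, rather than round-tripping through a decoder in the same crate, is the point of the technique: a round-trip test still passes when encoder and decoder change together, while the host-side decoder in another repo would break.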